Reducing the Performance Impact of Instruction Cache Misses
Authors
Abstract
In conventional processors, each instruction cache fetch brings in a group of instructions. Upon encountering an instruction cache miss, the processor waits until the miss is serviced before fetching any new instructions. This paper presents a new technique, called out-of-order issue, which allows the processor to temporarily ignore the instructions associated with the instruction cache miss. The processor attempts to fetch the instructions that follow the group of instructions associated with the miss. These instructions are then decoded and written into the processor's reservation stations. Later, after the instruction cache miss has been serviced, the instructions associated with the miss are decoded and written into the reservation stations. (We use the term issue to indicate the act of writing instructions into the reservation stations. With this technique, instructions are not written into the reservation stations in program order; hence the term out-of-order issue.) In this paper, we introduce the concept of out-of-order issue, describe its implementation, and present some initial data showing the performance gains possible with out-of-order issue.
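To make the issue ordering concrete, below is a minimal sketch of the policy the abstract describes: when a fetch group misses in the instruction cache, the front end sets that group aside, continues fetching and issuing younger groups into the reservation stations, and issues the deferred group once its miss is serviced. This is not the authors' implementation; the fetch-group size, miss latency, and the issue_order function are illustrative assumptions.

from collections import deque

FETCH_GROUP = 4       # instructions per fetch group (assumed)
MISS_LATENCY = 10     # cycles to service an instruction cache miss (assumed)

def issue_order(groups, miss_indices):
    """Return the order in which fetch groups reach the reservation stations.
    groups: list of fetch groups (each a list of instruction names).
    miss_indices: set of group indices that miss in the instruction cache."""
    order = []
    pending = deque()   # (ready_cycle, group index) for outstanding misses
    cycle = 0
    for idx in range(len(groups)):
        cycle += 1
        # Issue any deferred group whose cache fill has arrived by now.
        while pending and pending[0][0] <= cycle:
            order.append(pending.popleft()[1])   # late, out-of-order issue
        if idx in miss_indices:
            # Miss: set this group aside and keep fetching younger groups.
            pending.append((cycle + MISS_LATENCY, idx))
        else:
            order.append(idx)                    # hit: issue immediately
    # Drain misses still outstanding after the fetch stream ends.
    while pending:
        order.append(pending.popleft()[1])
    return order

if __name__ == "__main__":
    groups = [["i%d" % (FETCH_GROUP * g + k) for k in range(FETCH_GROUP)]
              for g in range(6)]
    print(issue_order(groups, miss_indices={1}))   # prints [0, 2, 3, 4, 5, 1]

A real design must also track register dependences between the deferred group and the younger instructions issued ahead of it; the renaming and wakeup logic handles that in hardware, and the sketch above deliberately ignores it to keep the issue ordering visible.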
Similar papers
Effect of Node Size on the Performance of Cache-Conscious Indices
In main-memory environments, the number of processor cache misses has a critical impact on the performance of the system. Cache-conscious indices are designed to improve the performance of main-memory indices by reducing the number of processor cache misses that are incurred during a search operation. Conventional wisdom suggests that the index's node size should be equal to the cache line size ...
Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions
As the degree of instruction-level parallelism in superscalar architectures increases, the gap between processor and memory performance continues to grow, requiring more aggressive techniques to increase the performance of the memory system. We propose a new technique, which is based on the wrong-path execution of loads far beyond instruction fetch-limiting conditional branches, to exploit more ...
Temporal-Based Procedure Reordering for Improved Instruction Cache Performance
As the gap between memory and processor performance continues to grow, it becomes increasingly important to exploit cache memory effectively. Both hardware and software techniques can be used to better utilize the cache. Hardware solutions focus on organization, while most software solutions investigate how best to lay out a program on the available memory space. In this paper we present a new l...
Understanding the Backward Slices of Performance Degrading Instructions
For many applications, branch mispredictions and cache misses limit a processor's performance to a level well below its peak instruction throughput. A small fraction of static instructions, whose behavior cannot be anticipated using current branch predictors and caches, contribute a large fraction of such performance degrading events. This paper analyzes the dynamic instruction stream leading u...
The Effect of Executing Mispredicted Load Instructions in a Speculative Multithreaded Architecture
Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism in application programs. A single-threaded sequencing mechanism needs speculative execution beyond conditional branches in order to exploit more instruction-level parallelism. In addition, an aggressive multithreaded architecture should also use thread-level control speculation in order to exploit ...
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors
Current microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are les...